{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-colab"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/pathwaycom/pathway/blob/main/examples/notebooks/tutorials/json_type.ipynb\" target=\"_parent\"><img src=\"https://pathway.com/assets/colab-badge.svg\" alt=\"Run In Colab\" class=\"inline\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "notebook-instructions",
      "source": [
        "# Installing Pathway with Python 3.10+\n",
        "\n",
        "In the cell below, we install Pathway into a Python 3.10+ Linux runtime.\n",
        "\n",
        "> **If you are running in Google Colab, please run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning.\n",
        "> \n",
        "> **The installation and loading time is less than 1 minute**.\n"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "pip-installation-pathway",
      "source": [
        "%%capture --no-display\n",
        "!pip install --prefer-binary pathway"
      ],
      "execution_count": null,
      "outputs": [],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "logging",
      "source": [
        "import logging\n",
        "\n",
        "logging.basicConfig(level=logging.CRITICAL)"
      ],
      "execution_count": null,
      "outputs": [],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "id": "2",
      "metadata": {
        "lines_to_next_cell": 0
      },
      "source": [
        "# Handling JSON in Pathway\n",
        "\n",
        "JSON is a widely used format for data interchange due to its simplicity and readability. Upon finishing this article, managing JSON in Pathway should become effortlessly intuitive.\n",
        "\n",
        "As an example, we'll use JSON objects loaded directly from python list. However, JSON data can come from various sources that support this format, such as [Kafka](/developers/user-guide/connect/connectors/kafka_connectors) or an [HTTP connector](/developers/api-docs/pathway-io/http#pathway.io.http.rest_connector)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3",
      "metadata": {},
      "outputs": [],
      "source": [
        "\n",
        "rows = [\n",
        "    (\n",
        "        1,\n",
        "        {\n",
        "            \"author\": {\"id\": 1, \"name\": \"Haruki Murakami\"},\n",
        "            \"books\": [\n",
        "                {\"title\": \"Norwegian Wood\", \"year\": 1987},\n",
        "                {\n",
        "                    \"title\": \"Kafka on the Shore\",\n",
        "                    \"year\": 2002,\n",
        "                    \"category\": \"Literary Fiction\",\n",
        "                },\n",
        "            ],\n",
        "        },\n",
        "    ),\n",
        "    (\n",
        "        2,\n",
        "        {\n",
        "            \"author\": {\"id\": 2, \"name\": \"Stanis\u0142aw Lem\"},\n",
        "            \"books\": [\n",
        "                {\"title\": \"Solaris\", \"year\": 1961, \"category\": \"Science Fiction\"},\n",
        "                {\"title\": \"The Cyberiad\", \"year\": 1967, \"category\": \"Science Fiction\"},\n",
        "            ],\n",
        "        },\n",
        "    ),\n",
        "    (\n",
        "        3,\n",
        "        {\n",
        "            \"author\": {\"id\": 3, \"name\": \"William Shakespeare\"},\n",
        "            \"books\": [\n",
        "                {\"title\": \"Hamlet\", \"year\": 1603, \"category\": \"Tragedy\"},\n",
        "                {\"title\": \"Macbeth\", \"year\": 1623, \"category\": \"Tragedy\"},\n",
        "            ],\n",
        "        },\n",
        "    ),\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4",
      "metadata": {
        "lines_to_next_cell": 0
      },
      "source": [
        "Each JSON object carries information about an author and their associated books. To load it, let's establish a [schema](/developers/user-guide/connect/schema#understanding-data-types-and-schemas) reflecting the data's structure and then proceed to load this data into a table."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "5",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | key | data\n",
            "^X1MXHYY... | 1   | {\"author\": {\"id\": 1, \"name\": \"Haruki Murakami\"}, \"books\": [{\"title\": \"Norwegian Wood\", \"year\": 1987}, {\"title\": \"Kafka on the Shore\", \"year\": 2002, \"category\": \"Literary Fiction\"}]}\n",
            "^YYY4HAB... | 2   | {\"author\": {\"id\": 2, \"name\": \"Stanis\\u0142aw Lem\"}, \"books\": [{\"title\": \"Solaris\", \"year\": 1961, \"category\": \"Science Fiction\"}, {\"title\": \"The Cyberiad\", \"year\": 1967, \"category\": \"Science Fiction\"}]}\n",
            "^Z3QWT29... | 3   | {\"author\": {\"id\": 3, \"name\": \"William Shakespeare\"}, \"books\": [{\"title\": \"Hamlet\", \"year\": 1603, \"category\": \"Tragedy\"}, {\"title\": \"Macbeth\", \"year\": 1623, \"category\": \"Tragedy\"}]}\n"
          ]
        }
      ],
      "source": [
        "import pathway as pw\n",
        "\n",
        "# To use advanced features with Pathway Scale, get your free license key from\n",
        "# https://pathway.com/features and paste it below.\n",
        "# To use Pathway Community, comment out the line below.\n",
        "pw.set_license_key(\"demo-license-key-with-telemetry\")\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "class InputSchema(pw.Schema):\n",
        "    key: int\n",
        "    data: pw.Json\n",
        "\n",
        "\n",
        "table = pw.debug.table_from_rows(schema=InputSchema, rows=rows)\n",
        "\n",
        "table"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6",
      "metadata": {},
      "source": [
        "Pathway enables manipulation of JSON from two perspectives: expressions and [user-defined functions](/developers/api-docs/pathway#pathway.udf). Let's examine each one separately."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7",
      "metadata": {},
      "source": [
        "## Working with JSONs using expressions\n",
        "\n",
        "### Accessing JSON fields\n",
        "\n",
        "A column of type [`pw.Json`](/developers/api-docs/pathway#pathway.Json) enables access to its attributes using the index operator (`[]`). This operator accepts an index in the form of a string for JSON objects, an integer for JSON arrays, or an expression evaluating to one of these types. If there's no element at the index or if the value is `pw.Json.NULL`, it returns `pw.Json.NULL`, making this operator convenient for chaining."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | author                | books\n",
            "^X1MXHYY... | \"Haruki Murakami\"     | [{\"title\": \"Norwegian Wood\", \"year\": 1987}, {\"title\": \"Kafka on the Shore\", \"year\": 2002, \"category\": \"Literary Fiction\"}]\n",
            "^YYY4HAB... | \"Stanis\\u0142aw Lem\"  | [{\"title\": \"Solaris\", \"year\": 1961, \"category\": \"Science Fiction\"}, {\"title\": \"The Cyberiad\", \"year\": 1967, \"category\": \"Science Fiction\"}]\n",
            "^Z3QWT29... | \"William Shakespeare\" | [{\"title\": \"Hamlet\", \"year\": 1603, \"category\": \"Tragedy\"}, {\"title\": \"Macbeth\", \"year\": 1623, \"category\": \"Tragedy\"}]\n"
          ]
        }
      ],
      "source": [
        "books = table.select(author=pw.this.data[\"author\"][\"name\"], books=pw.this.data[\"books\"])\n",
        "books"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9",
      "metadata": {},
      "source": [
        "Alternatively, a `get()` method can be used to access `JSON` attributes. This method allows defining a custom default value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "10",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | author                | title            | category\n",
            "^X1MXHYY... | \"Haruki Murakami\"     | \"Norwegian Wood\" | \"Uncategorized\"\n",
            "^YYY4HAB... | \"Stanis\\u0142aw Lem\"  | \"Solaris\"        | \"Science Fiction\"\n",
            "^Z3QWT29... | \"William Shakespeare\" | \"Hamlet\"         | \"Tragedy\"\n"
          ]
        }
      ],
      "source": [
        "sample = table.select(\n",
        "    author=pw.this.data[\"author\"][\"name\"],\n",
        "    title=pw.this.data[\"books\"][0][\"title\"],\n",
        "    category=pw.this.data[\"books\"][0].get(\"category\", default=pw.Json(\"Uncategorized\")),\n",
        ")\n",
        "sample"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "11",
      "metadata": {},
      "source": [
        "### Converting to simple types\n",
        "\n",
        "`JSON` column can be converted into `Optional[T]` where `T` is one of the simple types, using methods: [`as_int()`](/developers/api-docs/pathway#pathway.ColumnExpression.as_int), [`as_str()`](/developers/api-docs/pathway#pathway.ColumnExpression.as_str), [`as_float()`](/developers/api-docs/pathway#pathway.ColumnExpression.as_float), [`as_bool()`](/developers/api-docs/pathway#pathway.ColumnExpression.as_bool)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "12",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | author\n",
            "^X1MXHYY... | HARUKI MURAKAMI\n",
            "^YYY4HAB... | STANIS\u0141AW LEM\n",
            "^Z3QWT29... | WILLIAM SHAKESPEARE\n"
          ]
        }
      ],
      "source": [
        "books.select(author=pw.unwrap(pw.this.author.as_str()).str.upper())\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "13",
      "metadata": {},
      "source": [
        "### Flatten\n",
        "\n",
        "You can utilize the [`flatten()`](/developers/api-docs/pathway-table#pathway.internals.table.Table.flatten) operator specifically on columns that contain JSON arrays. It's a useful tool when working with complex JSON structures."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "14",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | books                                                                         | author",
            "\n",
            "^Z3QVC46... | {\"title\": \"Hamlet\", \"year\": 1603, \"category\": \"Tragedy\"}                      | \"William Shakespeare\"\n",
            "^X1MQZF8... | {\"title\": \"Kafka on the Shore\", \"year\": 2002, \"category\": \"Literary Fiction\"} | \"Haruki Murakami\"\n",
            "^Z3QHRW2... | {\"title\": \"Macbeth\", \"year\": 1623, \"category\": \"Tragedy\"}                     | \"William Shakespeare\"\n",
            "^X1MGYPB... | {\"title\": \"Norwegian Wood\", \"year\": 1987}                                     | \"Haruki Murakami\"\n",
            "^YYYA47A... | {\"title\": \"Solaris\", \"year\": 1961, \"category\": \"Science Fiction\"}             | \"Stanis\\u0142aw Lem\"\n",
            "^YYY18MS... | {\"title\": \"The Cyberiad\", \"year\": 1967, \"category\": \"Science Fiction\"}        | \"Stanis\\u0142aw Lem\"\n"
          ]
        }
      ],
      "source": [
        "flat_list = books.flatten(pw.this.books)\n",
        "flat_list"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "15",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "source": [
        "## JSON in UDFs"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "16",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "source": [
        "Pathway enables manipulation of JSON using [user-defined functions](/developers/api-docs/pathway#pathway.udf). Just like with expressions, the index operator (`[]`) and methods allowing conversion into specific types are available.  It's crucial to note that this conversion is strict\u2014 attempting to convert incompatible data will result in an exception."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "17",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | title                | metadata\n",
            "^Z3QVC46... | \"Hamlet\"             | {\"century\": 17, \"title\": \"Hamlet\", \"year\": 1603, \"category\": \"Tragedy\"}\n",
            "^X1MQZF8... | \"Kafka on the Shore\" | {\"century\": 21, \"title\": \"Kafka on the Shore\", \"year\": 2002, \"category\": \"Literary Fiction\"}\n",
            "^Z3QHRW2... | \"Macbeth\"            | {\"century\": 17, \"title\": \"Macbeth\", \"year\": 1623, \"category\": \"Tragedy\"}\n",
            "^X1MGYPB... | \"Norwegian Wood\"     | {\"century\": 20, \"title\": \"Norwegian Wood\", \"year\": 1987}\n",
            "^YYYA47A... | \"Solaris\"            | {\"century\": 20, \"title\": \"Solaris\", \"year\": 1961, \"category\": \"Science Fiction\"}\n",
            "^YYY18MS... | \"The Cyberiad\"       | {\"century\": 20, \"title\": \"The Cyberiad\", \"year\": 1967, \"category\": \"Science Fiction\"}\n"
          ]
        }
      ],
      "source": [
        "@pw.udf\n",
        "def transform(data: pw.Json) -> pw.Json:\n",
        "    return {\"century\": (data[\"year\"].as_int()) // 100 + 1, **data.as_dict()}\n",
        "\n",
        "\n",
        "flat_list.select(title=pw.this.books[\"title\"], metadata=transform(pw.this.books))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "18",
      "metadata": {},
      "source": [
        "Further details about `pw.Json` functionality are available in the dedicated [API documentation](/developers/api-docs/pathway#pathway.Json)."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.11"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}